Abstract. This article investigates the functional properties of complex networks used as grid computing systems. Complex networks following the Erdos-Renyi model and other models with a preferential attachment rule (with and without growth) or priority to the connection of isolated nodes are studied. Regular networks are also considered for comparison. The processing load of the parallel program executed on the grid is assigned to the nodes on demand, and the efficiency of the overall computation is quantified in terms of the parallel speedup. It is found that networks with preferential attachment allow lower computing efficiency than networks with uniform link attachment. At the same time, considering only node clusters of the same size, preferential attachment networks display better efficiencies. The regular networks, on the other hand, display a poor efficiency, due to their implied larger internode distances. A correlation is observed between the topological properties of the networks, especially average cluster size, and their respective computing efficiency.

PACS. 89.75.Fb Structures and organization in complex systems - 89.20.Ff Computer science and technology - 89.20.Hh World Wide Web, Internet - 02.10.Ox Combinatorics; graph theory - 89.75.Hc Networks and genealogical trees

1 Introduction

Among the many implications of the scientific and technological advances in microelectronics over the last decades, the availability of microprocessors characterized by ever diminishing size, cost and power consumption (per operation), together with increasing computing power [?], has led to an unprecedented opportunity for parallel computing, allowing simulations of non-linear and complex physical systems. More recently, the advent of the Internet paved the way for using a network of computers to obtain a very large and powerful computing system, defining the new research areas of grid computing and parasitic computing.

A parallel computing system consists of a set of processing elements connected by some kind of communication network. A parallel program runs on the system by partitioning the work to be done in several pieces that are executed on the available processing elements. To collaborate in the execution of the program, each piece must, as a rule, communicate with others through the communication network. To show a good performance in the execution of a program, a parallel system must then: (i) make a large number of processing elements available to the application; and (ii) enable fast communications between these processing elements.

For grid computing, the first condition can be met by the large number of computers available in the Internet, as long as their owners agree to make their processing power available. But the second condition is difficult to meet: even if high bandwidth networks are common, the latency to deliver a message to a site geographically far from the origin is large. The increased processing power of microprocessors due to Moore's law (i.e. the number of transistors in a chip doubles every 1.5 years) contrasts with the typically slow interconnection between processing elements, thus undermining the performance of parallel systems due to the relatively large time spent to send data for processing elsewhere in comparison with the time taken to process that data. A program will only have good performance on the grid if the amount of computations done is large in comparison with the time needed to send a message between two processors. Thus, the effective use of grid computing remains a challenge, demanding a delicate balance between computation and interprocess communication workloads. As a rule, the weaker the coupling (i.e. the amount of communication needed) between different processing tasks, the higher the overall efficiency in a given parallel system.

Generally, a more densely connected processing network favors faster data transmission, as the mean distance between nodes tends to decrease, but at the expense of additional communication resources. Moreover, the specific kind of processing to be performed and the availability of the computer resources for collaborative (or parasitic) computing also play an important role in defining the overall grid execution performance.

The novel area of complex networks (e.g. [?]) has drawn increasing interest of the physics community. A large part of the success of such an approach derives from the fact that such networks have been found to adhere, at varying degrees, with important real phenomena such as transmission of infectious diseases, social and ecological interactions, and the Internet. The fact that today most computers are interconnected through the Internet has contributed further to promote the systematic investigation of the Internet characteristics, a task that can greatly benefit from physical modeling approaches.

As several features are shared by grid computing systems and complex networks, much can be gained through integrative and comparative approaches, allowing cross-fertilization between those two important areas. The underlying idea in the current work is to study the efficiency of parallel/distributed architectures whose interconnections are defined in terms of complex network models. Such an investigation therefore focuses on the integration between topology and function of the networks, an important aspect of complex network research [?]. Regular networks, which are often used in parallel computing, are also considered as a reference for comparison. It is particularly interesting to verify how the specific properties of these interconnecting schemes, such as the average vertex-vertex distance and cluster size, affect the processing time and efficiency for different network configurations. Such a possible dependence between the topological and functional properties of the networks is backed by recent works which verified that the emergent features of complex networks, such as associative memory recall in neuronal networks, can be strongly affected by the network interconnecting scheme and phase transitions [8-10]. In related works, the complex network paradigm has also been explored from the perspective of search algorithms [11-13] and information transfer in graphs [14-16].

2 Network Models Used

The main purpose of this article is to study the influence of some network topological features in the efficiency of a grid system for the execution of a suitable parallel program. Therefore, we will not try to use network models that reproduce the characteristics of the Internet; we instead restrict ourselves to some simple models which are described below. The models are undirected, reflecting the bidirectional transfer of packets on the Internet.

Let a complex network be represented as a graph with n nodes, identified as i, i = 0, ..., n - 1, and unweighted, undirected edges represented as (i, j). The first model considered is the Erdos-Renyi (ER) model with a fixed number c of edges. In this model, for each connection, two nodes i and j are chosen uniformly among all the nodes to establish the connection (i, j). Self-connections (connections of a node with itself) and duplicate connections (connections between already connected nodes) are avoided in this and all the following models.

ER graphs have a fast decaying degree distribution, with very small probability for nodes with high degree (also called hubs). To widen the degree distribution and increase the probability of high degree nodes, a preferential attachment (PA) model is used. Networks in this model are generated as described in [9]: starting with all nodes without connections, the two nodes of each new connection are drawn from a list in which each node number appears a number of times proportional to its current number of connections (plus one, to account for the unconnected nodes). Note that this network model, although having preferential attachment, has no growth and so does not lead to scale-free networks (see Section 3.1).

Hubs of larger degrees can be found in scale-free networks. A simple model for scale-free networks was proposed by Barabasi and Albert [17]. In their model, the network starts with a small number $m_0$ of connected nodes and grows by the addition of one node at a time. When a node is added, m new connections from the new node to already existing nodes are made, and each already existing node can be chosen to receive a connection with probability proportional to its degree k. In this model, all nodes in the network form a single large connected component; as we are interested in studying the influence of percolation with growing connectivity, their model is not adequate due to the nonexistence of a percolation transition and to the impossibility of specifying an average connectivity that is not an even integer (the average degree is always 2m). For this reason, their model is here generalized as described in the following. In the model used in this work, which we call scale-free (SF) model, instead of having a fixed number of connections for each new node i, a random number $m_i$ is chosen using a Poisson distribution with mean m, and $m_i$ connections from this node to the already existing nodes are made. As some nodes may have $m_i = 0$, the network will have unconnected nodes; to enable these nodes to receive connections with the addition of new nodes, each already existing node is chosen with probability proportional to k + 1 instead of k. In this model, as m is only a mean value, it can be any real number (instead of only an integer number, as in the Barabasi-Albert model); also, as new nodes are not necessarily connected to the already existing nodes, the network consists of many connected components.

For grid computing, as for all kinds of collaborative work between the agents represented by a network, the binding of the agents to other agents of the network is a necessity. There is, therefore, a tendency to bind nodes to the network as new communication resources are available, instead of using them to bind already connected nodes. This suggests a different kind of random network construction: connecting new nodes to the network should be given preference while there are still isolated nodes. For that purpose, we introduce here two new models of random networks. The parameters for their construction are the number of nodes n and the number of connections c.

In the first model, one end of each new edge is chosen with uniform distribution among the isolated nodes and the other end with uniform distribution among all nodes (isolated or not). If at some point no more isolated nodes exist, both ends of the remaining edges are drawn with uniform distribution among all nodes.

In the second model, one end of the new edge is drawn randomly from the set of isolated nodes, as for the previous model, but the other end is drawn among all nodes with probability proportional to the node degrees, as in the previously discussed preferential attachment network.

We call these networks insertion networks, because the connections are used to insert the nodes in the network; the first model is called uniform-uniform (UU) insertion network, and the second model uniform-preferential (UP) insertion network.

For comparison, we study also three regular network structures common in parallel systems [18]: the hypercube and the 2D and 3D tori. The construction of a hypercube with n nodes can be explained as follows. For a node i, represent the value of i in binary using $\lceil \log_2 n \rceil$ bits; there is a link (i, j) iff the binary representation of
j differs in exactly one bit from that of i. For example, in an n = 8 network, node 3 (binary 011) is connected to nodes 2 (010), 1 (001) and 7 (111). In the bi-dimensional torus the nodes are distributed in a grid of size $(n_x, n_y)$ (with $n_x n_y = n$), each node receiving a label (x, y), where $x = \lfloor i/n_y \rfloor$ and $y = i \bmod n_y$; node (x, y) is connected with node (x', y') iff $x' = x$ and $y' = (y \pm 1) \bmod n_y$, or $y' = y$ and $x' = (x \pm 1) \bmod n_x$. For example, in an n = 8 bi-dimensional torus organized as a 2 x 4 grid, node 3 corresponds to coordinates (0, 3) and is connected to nodes 2 (0, 2), 0 (0, 0), and 7 (1, 3). The three-dimensional torus is similarly constructed.

3 Results

The results are divided in two parts. First, some results concerning the properties of the network models described are presented. Then, results obtained by using these network models as communication infrastructure for the simulation of the execution of a parallel program on a grid are shown.

3.1 Network properties

A thorough analysis of the network models described is out of the scope of this paper. Here only some properties of interest to the analysis of the following grid simulation results (Section 3.2) are presented.

The topological properties of those networks are quantified in terms of the following measures: (a) node degree k; (b) mean vertex-vertex distance

$$\ell = \frac{2}{n(n-1)} \sum_{i>j} d_{ij},$$

where $d_{ij}$ is the geodesic distance (distance, in number of links, of the shortest path) between nodes i and j and the summation includes only pairs ij that have a path connecting them; and (c) mean cluster size s. A cluster, also known as a connected component, is a set of directly or indirectly connected nodes, i.e., nodes that can be reached from all the other nodes of the cluster by a path. To evaluate the mean cluster size, we compute for each node of the network the number of nodes in its cluster and take the average of these values over all nodes of the network. Note that this average includes the largest cluster, and is dominated by it after percolation. To enable the comparison between the different complex network models with respect to their connectivity, we use the parameter z defined as z = 2m for the SF model and z = 2c/n for the other models. In the limit of large n we have $z = \langle k \rangle$ for all these models.

In the following, network characteristics are quantified as averages for 50 random networks for each set of parameters. Error bars display the 99% confidence interval for the computed average, considering normal distribution of the averaged values.

The degree distributions for each of the considered models for n = 100000 and z = 3, compared with the ER model, are presented in Figure 1, where P(k) is the cumulative probability distribution (probability of finding a node in the network with degree larger than k). The UU model is almost indistinguishable from the ER model in terms of degree distribution. Due to the preferential attachment rule, the PA and UP models have broader degree distributions. For the UP model this effect is not so marked due to the preference given to newly added nodes in new connections. As expected, the SF model follows a power law (with a finite-size cutoff), giving rise to a large probability of nodes with high degree.

Fig. 1. Degree distributions for the network models. The ER model results appear in all figures, as a reference for comparison. The networks have n = 100000 nodes and z = 3: (a) preferential attachment; (b) insertion with uniform probability; (c) insertion with preferential attachment; and (d) scale-free.

Fig. 2 gives the average network cluster size $\langle s \rangle$ normalized by the network size n, and Fig. 3 shows the average node distance $\langle \ell \rangle$ for the networks. These results are shown as functions of z comparatively to the ER model. As Figs. 2(a-d) show, there is an abrupt transition (a percolation transition) from small to large cluster sizes as the connectivity grows. In the small cluster size region, the mean distance tends to grow with the connectivity (Fig. 3), as new links result in new connections between previously unconnected nodes; in the large cluster region, increased connectivity reduces mean distance, as most of the nodes are already connected in the largest cluster. A striking feature of the results for the PA model (and to some extent for the SF model) is the small cluster sizes even for high connectivity. The reason is that, as the number of nodes connected in the largest cluster grows, the probability of linking an unconnected node is very small, due to the preferential attachment rule used to choose the ends of new links, resulting in a relatively large number of isolated nodes or small clusters. In the SF model, this problem is minimized by the fact that one end of each new connection always goes to a new (previously unconnected) node. The insertion (UU and UP) models show very similar behavior in terms of average distance and cluster size. The formation of a cluster spanning most of the nodes occurs for these models for higher connectivities than for the ER model. The explanation is that links that could be used to connect small clusters to form a larger one are being used to link new nodes. On the other hand, the size of the resulting largest cluster tends to be larger, as isolated nodes are less likely to appear. Distances in these models tend to be higher than in the ER model, because the same number of links is used to connect a larger number of nodes. For sufficiently higher connectivities, this last property is compensated in the UP model by the onset of high degree nodes (hubs), which shorten the mean distance between the nodes of the cluster. The presence of hubs is also the explanation for the smaller distances shown by the PA and SF models; this advantage vanishes as the connectivity grows.

Fig. 2. Normalized average cluster size, for n = 1000, for the PA (a), UU (b), UP (c) and SF (d) models, compared with the ER model.

Fig. 3. Average distance between connected pairs, for n = 1000, for the PA (a), UU (b), UP (c) and SF (d) models, compared with the ER model.

More detail about the size of clusters is given in the left side of Figs. 4(a-e), which display the cumulative probability distribution of cluster sizes (i.e. the probability P(s) that a randomly selected node is part of a cluster of size greater or equal to s) against s/n for some values of z. These figures show that for small average connectivities the probability of finding a large cluster is negligible. On the other hand, for sufficiently large average connectivities almost all nodes are found in large clusters. The models without preferential attachment (ER, Fig. 4(a), and especially the insertion models, UU, Fig. 4(c), and UP, Fig. 4(d)) show a sharp transition from a regime with low probability of large clusters and high probability of small clusters to a regime with high probability of large clusters and small probability of small clusters. On the preferential attachment models (PA, Fig. 4(b), and SF, Fig. 4(e)), this transition is more gradual; they also display a larger probability of medium sized clusters before the percolation.

In the regular networks, all nodes are connected, and so $\langle s \rangle / n = 1$. Also, the value of z is fixed for each network type, given n. The third column of Table 1 shows the values of the average distance for these network types.

Table 1. The topological and efficiency measurements for the three considered regular topologies (n = 1000).

Network type    z            $\langle \ell \rangle$    $\langle E \rangle$
2D torus        4            16 ± 7                    0.664
3D torus        6            7 ± 3                     0.778
Hypercube       9.9 ± 0.4    5 ± 2                     0.816

3.2 Grid Simulations

The parallel computing systems were obtained by assigning a processing unit to each network node, while messages flow along the network edges. The distributed application considered follows the master/slave paradigm (also known as manager/worker or bag of tasks), where a master delivers processing tasks on demand to slave computers. The computational tasks are assumed to be completely independent, in the sense that each node can proceed without additional communications after receiving the work packet. This arrangement is similar to many grid computing efforts, like SETI@home [19]. The computations are partitioned into M work packets (tasks), each requiring the same amount L of computing time. The communication cost is taken to correspond to the minimum number of edges between the master and the slave requesting the data. The edge communication overhead is therefore equal for all edges and adopted as time unit. Taking into account this communication cost model, the very small number of short cycles present in all the considered network models (compared with the Internet) does not represent an additional limitation.
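As an illustration of how the network models of Section 2 can be generated, the following is a minimal Python sketch of the ER model and of the two insertion models (UU and UP). The function names are hypothetical and not taken from the original work. Two points are assumptions rather than statements of the text: the preferential draw in the UP model uses weights k + 1 (so that the very first edge can be placed while all degrees are zero), and, once no isolated nodes remain, the UP model keeps drawing the first end uniformly and the second end preferentially.

```python
import random


def _add_edge(edges, i, j):
    """Add undirected edge (i, j); reject self-connections and duplicates."""
    if i == j:
        return False
    e = (min(i, j), max(i, j))
    if e in edges:
        return False
    edges.add(e)
    return True


def _check(n, c):
    if c > n * (n - 1) // 2:
        raise ValueError("more edges requested than a simple graph allows")


def er_network(n, c, rng):
    """ER model with a fixed number c of edges: both ends drawn uniformly."""
    _check(n, c)
    edges = set()
    while len(edges) < c:
        _add_edge(edges, rng.randrange(n), rng.randrange(n))
    return edges


def insertion_network(n, c, rng, preferential=False):
    """UU (preferential=False) or UP (preferential=True) insertion model.

    One end of each edge is drawn uniformly among the isolated nodes while
    any exist. The O(n) scan per edge favors clarity over speed.
    """
    _check(n, c)
    edges = set()
    degree = [0] * n
    while len(edges) < c:
        isolated = [v for v in range(n) if degree[v] == 0]
        i = rng.choice(isolated) if isolated else rng.randrange(n)
        if preferential:
            # Assumption: weight k + 1, as in the PA model of the text.
            j = rng.choices(range(n), weights=[k + 1 for k in degree])[0]
        else:
            j = rng.randrange(n)
        if _add_edge(edges, i, j):
            degree[i] += 1
            degree[j] += 1
    return edges
```

With z = 2c/n, a call such as `insertion_network(1000, 1500, random.Random(0))` would produce one UU network at z = 3. Because every accepted edge removes at least one isolated node while any remain, c >= n - 1 edges are enough to leave no isolated node in either insertion model.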

Fig. 4. Cumulative probability distribution of cluster size P(s) (probability of a randomly selected node being part of a cluster of size s' > s) in terms of s/n and parallel efficiency P(E) (probability of a randomly selected node achieving efficiency E' > E when chosen as master) for the five models: (a) ER; (b) PA; (c) UU; (d) UP; and (e) SF. Distributions shown for five different average connectivity (z) values: z = 0.4 (O); z = 0.8 (+); z = 1.2 (□); z = 2.0 (x); and z = 5.0 (o). Network size: n = 1000; number of work packets M = 5000; work packet size L = 100.

Given a network, each node i in turn is considered as master. The nodes that are part of the same cluster as the master start requesting tasks. The nodes that are not part of the same cluster as the master cannot contribute to the computation because of the lack of connection to the master. After receiving a task from the master, the slave computes the result, taking time L, and sends it, together with a request for another task, to the master. When all M tasks have been delivered and their results received, the master terminates the execution and computes its total execution time $T_i$. Isolated nodes cannot take part in a distributed computation and so, when chosen as master nodes, their execution time is considered infinite (they will wait forever to receive a work request from a slave).
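The on-demand protocol just described can be sketched as a small event-driven simulation. This is only a sketch under explicit assumptions, not the simulator used for the reported results: the master answers requests instantly, messages on shared links do not delay each other, the first request of a slave at distance d reaches the master at time d, and the names `bfs_distances` and `execution_time` are hypothetical. The network is given as an adjacency mapping from each node to its neighbors.

```python
import heapq
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node in the same cluster."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def execution_time(adj, master, M, L):
    """Total time until the master has received the results of all M tasks."""
    dist = bfs_distances(adj, master)
    slave_dists = [d for v, d in dist.items() if v != master]
    if not slave_dists:
        return float("inf")  # isolated master: no slave ever requests work
    # Each event is (time a request reaches the master, distance of the slave).
    events = [(d, d) for d in slave_dists]
    heapq.heapify(events)
    delivered = 0
    finish = 0
    while delivered < M:
        t, d = heapq.heappop(events)
        delivered += 1
        # Task travels d hops, is computed in time L, result travels d hops
        # back together with the next request.
        t_result = t + 2 * d + L
        finish = max(finish, t_result)
        heapq.heappush(events, (t_result, d))
    return finish


def speedup(adj, master, M, L):
    """Ratio of sequential time M*L to the simulated parallel time."""
    return M * L / execution_time(adj, master, M, L)
```

For a two-node network `{0: [1], 1: [0]}` with M = 3 and L = 10, the single slave needs 1 time unit for its first request and 12 per task, so the simulated time is 37. An isolated master gives an infinite time and therefore a speedup of zero.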

To quantify the suitability of the network models for 
grid computing we compute the average speedup achieved 
by the execution of the application on the networks. The 
speedup is defined as the ratio between sequential and parallel execution times. For the problem considered, the parallel execution time for master node i is the value of $T_i$ discussed in the previous paragraph and the sequential execution time is ML, so that

$$S_i = \frac{ML}{T_i}$$

is the speedup. The mean speedup of a network is the mean value

$$S = \frac{1}{n} \sum_{i} S_i.$$

Note that, for isolated nodes, as discussed, $T_i = \infty$ and so $S_i = 0$; as these nodes are anyway considered in the average, if a network has many isolated nodes its average speedup is low. Another equivalent measure, used in our results, is the normalized speedup E = S/n, also known as parallel efficiency. The averages $\langle S \rangle$ and $\langle E \rangle$ are then taken for 50 different random networks for each model and parameter set.

If master i sends a task to slave j, the time to complete the task (get the result back) is the sum of the computation time of the task with the time taken to send the task to the slave node and receive it back, that is, $L + 2d_{ij}$. At the same time, the number of nodes computing tasks is $s_i - 1$ ($s_i$ is the size of the cluster of master node i and we subtract one because the master does not compute tasks). This indicates that, for the problem considered, two important network metrics are the average cluster size $\langle s \rangle$ (Fig. 2) and the average distance $\ell$ between nodes with a path connecting them (Fig. 3).

The number of tasks M was chosen so that each node has some tasks to process, while avoiding too large a number of tasks, because the simulation time is proportional to M; the size of the tasks L was chosen to achieve computation times larger than the average communication time (two times the average distance; see also Figs. 3(a-d)), but small enough such that the communication time has a detectable effect. If L is too large, the average communication time, and then the network structure, is not important; if it is too small, the grid system is not a good choice for the execution of the application. Fig. 5(a) shows the dependence of average efficiency $\langle E \rangle$ on the number of tasks M (for fixed L = 100, n = 1000, and z = 3) for the five models. If M is small, there will not be enough computational work to distribute evenly among the nodes, resulting in very small efficiencies; after some value of M, the efficiency is not much affected by a further increase of M. The relation of the efficiency with L is very similar: a larger value of L helps reduce the importance of the communication costs, increasing the efficiency; Fig. 5(b) plots this relation (for M = 5000, n = 1000, and z = 3). It can be seen that after $L \approx 100$ there is no significant increase in efficiency for larger L. The following results assume M = 5000 and L = 100; results for other values show no qualitative differences.

Fig. 5. Parallel efficiency $\langle E \rangle$ as a function of the number M (a) and size L (b) of work packets. Network parameters are n = 1000 and z = 3. In (a) L = 100 and in (b) M = 5000.

Figs. 6(a-d) present the average parallel efficiency as a function of z for M = 5000 and L = 100. As can be easily seen by comparing Figs. 2(a-d) and 6(a-d), the average parallel efficiency tends to closely reflect the normalized average cluster size for all considered models.

Fig. 6. Average parallel efficiency for n = 1000, M = 5000, L = 100, for the PA (a), UU (b), UP (c) and SF (d) models, compared with the ER model.

This is further substantiated by a comparison of the left and right graphs of Figs. 4(a-e). The right graphs show the cumulative probability distribution P(E) of a randomly selected node having efficiency greater or equal to E when chosen as master node. It can be seen that P(E) closely follows P(s). The main differences are that the P(E) curves are smoother and the largest achievable values of E are smaller than those of s/n. The latter difference is a result of the fact that a cluster with s nodes cannot achieve speedup s, because the master node does not compute and the communication costs increase the execution time of each task (in comparison to sequential execution); the former difference is due to the fact that different clusters with the same number of nodes s have different interconnection topology, resulting in different communication costs.

As a consequence, similar conclusions can be drawn for the efficiency of a grid computing network as for the mean cluster sizes of the corresponding network model. Particularly, we note a percolation transition of the efficiency with increasing connectivity from a regime of very small efficiency to a regime of high efficiency. This transition is more abrupt for the ER and particularly for the UU and UP models, and somewhat slower for the PA and SF models. The efficiency of the models with preferential attachment is restricted due to the already discussed higher number of isolated nodes, which contribute a speedup of zero to the average speedup. The insertion models need a higher connectivity to reach the percolation point, but after that show a higher efficiency, demonstrating the advantage of using network resources to connect new nodes.

The strong correlation between the cluster size and parallel efficiency is further substantiated in Fig. 7, where a scatter plot of efficiency against normalized cluster size is shown, demonstrating the almost linear correlation between efficiency and normalized cluster sizes. Another interesting result observed from this figure is that the PA and SF networks tend to provide slightly better efficiencies than the other models. In other words, if clusters of similar size are considered, the presence of hubs that "short-circuit" the distances tends to enhance the speedup of the computations in the networks.
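The role of distances at fixed cluster size can be made explicit with a simple bound that follows from the per-task time $L + 2d_{ij}$ given earlier. It is a sketch under the idealized assumptions that the master answers instantly and that messages do not delay each other; $C_i$, denoting the cluster of master i, is notation introduced here. In a run of duration $T_i$, slave j can return at most $T_i/(L + 2d_{ij})$ results, and these must add up to M, so that

$$M \le T_i \sum_{j \in C_i,\, j \ne i} \frac{1}{L + 2d_{ij}}
\quad\Longrightarrow\quad
S_i = \frac{ML}{T_i} \le \sum_{j \in C_i,\, j \ne i} \frac{L}{L + 2d_{ij}} \le s_i - 1 .$$

Each slave thus contributes at most $L/(L + 2d_{ij}) < 1$ to the speedup. For two clusters of the same size $s_i$, the one with shorter distances to the master has the larger bound, which is consistent with the slightly better efficiencies observed when hubs shorten the distances.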

Fig. 7. Scatter plot of efficiency against cluster size, for n = 1000, M = 5000 and L = 100.

In order to further investigate such a possible effect of the intrinsic connectivity properties of the considered models over the respective performance, the largest clusters of the four models were considered in isolation in the grid simulations. That is, networks with the same connectivity were generated and only their largest clusters were considered. Special care was taken so as to obtain such connected clusters with equivalent number of nodes (about 1000 nodes). The results are shown in Figure 8(a-d), where the efficiency of the parallel execution on the largest clusters, $\langle E_{\mathrm{largest}} \rangle$, is plotted. As we are considering here only the largest clusters, the efficiency is computed with respect to the number of nodes of the cluster, and not of the whole network. The results indicate a definite tendency of the SF model, and to a lesser extent of the PA model, to outperform the others. Such an effect is possibly a consequence of the shorter average lengths usually observed for these models (see Figure 3(a-d)) and the presence of hubs which act as message distribution nodes.
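Extracting the largest cluster, and computing the mean cluster size as defined in Section 3.1 (the average, over all nodes, of the size of the cluster each node belongs to), can be sketched with a breadth-first search. The function names are hypothetical; the network is assumed to be given as a collection of undirected edges (i, j) over nodes 0, ..., n - 1.

```python
from collections import deque


def clusters(n, edges):
    """Return the connected components as lists of node numbers."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    component.append(v)
                    queue.append(v)
        components.append(component)
    return components


def mean_cluster_size(n, edges):
    """Average over nodes of the size of each node's own cluster.

    A cluster of size s is counted once for each of its s nodes, hence
    the sum of s squared divided by n.
    """
    return sum(len(c) ** 2 for c in clusters(n, edges)) / n


def largest_cluster(n, edges):
    """Nodes of the largest connected component, in increasing order."""
    return sorted(max(clusters(n, edges), key=len))
```

For n = 5 and edges (0, 1) and (1, 2), the clusters have sizes 3, 1 and 1, so the mean cluster size is (9 + 1 + 1)/5 = 2.2; the weighting by cluster size is why this average is dominated by the largest cluster after percolation.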

The results obtained for the regular topologies (hypercube and 2D and 3D torus) for n = 1000 are shown in Table 1. Interestingly, the 2D and 3D torus regular architectures led to rather low efficiencies, despite their relatively large node degrees, reflecting the respective large average distances, due to the absence of shortcut links in the regular structure of these networks. The hypercube topology allowed efficiencies comparable to the maximum obtained for the network models, but at the expense of almost 10 connections per node, which implies a high network cost.
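The construction rules of the hypercube and of the bi-dimensional torus given in Section 2 can be sketched as neighbor functions. The function names are hypothetical. For n that is not a power of two, the hypercube function keeps only the neighbors whose number is below n, which is how the rule "one differing bit among nodes 0, ..., n - 1" is read here.

```python
def hypercube_neighbors(i, n):
    """Neighbors of node i: numbers below n differing from i in one bit."""
    bits = (n - 1).bit_length()  # ceil(log2(n)) for n >= 2, integer-only
    flipped = (i ^ (1 << b) for b in range(bits))
    return [j for j in flipped if j < n]


def torus2d_neighbors(i, nx, ny):
    """Neighbors of node i in an nx-by-ny torus, with x = i // ny, y = i % ny."""
    x, y = divmod(i, ny)
    coords = {
        (x, (y + 1) % ny),
        (x, (y - 1) % ny),
        ((x + 1) % nx, y),
        ((x - 1) % nx, y),
    }
    coords.discard((x, y))  # only possible when a dimension has size 1
    return sorted(cx * ny + cy for cx, cy in coords)
```

These reproduce the two worked examples of the text: `hypercube_neighbors(3, 8)` gives nodes 2, 1 and 7, and `torus2d_neighbors(3, 2, 4)` gives nodes 0, 2 and 7. In the 2 x 4 torus the two wrap-around steps in x lead to the same node, so node 3 has three distinct neighbors rather than four.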

4 Conclusions 

While the scale-free and preferential attachment models allowed better efficiency considering only the largest cluster, the Erdos-Renyi model tended to provide better average speedup when all clusters were considered, as a consequence of the smaller number of isolated clusters implied by this type of network. The insertion models resulted in even better efficiencies, due to the inclusion of more nodes in the largest cluster after percolation. The random models had better efficiencies than the regular ones, due to the implied smaller average distances.

The results show that a network is of little use for grid 
computing before the percolation point is reached, that is, 
for values of z before the formation of a cluster spanning 
most of the nodes, because of the very small resulting 
efficiencies. In other words, the percolation of the network 
used as a grid computing resource is of fundamental 
importance to the utility of the grid. Although the Internet 
already connects a very large number of computers, 
the use of these computers for grid computing is subject 
mainly to two limitations: the consent of their 
owners and the installation of grid computing software 
on them. It is therefore appropriate to consider the grid 
network as distinct from the Internet, and two computers 
in the Internet as connected in the grid network only if 
their owners have given permission for their use and installed 
an appropriate set of protocols and compatible software. 
In this respect, the obtained results should motivate the grid 
community to achieve convergence in protocols and software, 
as the presence of many incompatible software platforms 
amounts to the presence of unconnected clusters of nodes in 
the network. This conclusion concerns the efficiency 
of the network as a whole for grid computing; for users in 
isolated clusters, the execution of an application in these 
clusters can nevertheless be of interest, if enough speedup 
is achieved. 
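
The role of the percolation point can be illustrated by measuring the fraction of nodes in the largest cluster of a random network as links are added. The following is a minimal sketch, not the procedure used in the experiments: it assumes that z denotes the mean number of links per node, places links between uniformly chosen node pairs (repeated pairs are allowed for simplicity), and uses a union-find structure to track clusters. The function name and parameters are hypothetical.

```python
import random


def largest_cluster_fraction(n, z, seed=0):
    """Fraction of nodes in the largest cluster of an ER-like random graph.

    Assumption: the graph has n nodes and round(n * z / 2) links placed
    between uniformly chosen pairs of distinct nodes, so that z is the
    mean degree. Repeated pairs are not filtered out.
    """
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(u):
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for _ in range(round(n * z / 2)):
        a, b = rng.sample(range(n), 2)
        ra, rb = find(a), find(b)
        if ra != rb:
            # Union by size: attach the smaller cluster to the larger one.
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
    return max(size[find(u)] for u in range(n)) / n


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0, 4.0):
        print(z, largest_cluster_fraction(2000, z))
```

For the Erdos-Renyi model the giant cluster is known to appear around z = 1, so values of z well below 1 leave only small isolated clusters, while values well above 1 place most nodes in a single cluster that the grid can exploit.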

Fig. 8. Efficiency of the largest clusters (E_largest), for the PA (a), UU (b), UP (c) and SF (d) models, compared with the ER 
model. In all cases M = 5000 and L = 100, and the size of the networks was chosen to have a largest cluster of about 1000 
nodes. 

Future work should extend the analysis to other types 
of distributed systems and applications. Network models 
closer to some real Internet characteristics, as in [20], 
should be considered. The inclusion of measures to quantify 
the load of intermediate nodes in packet transmission [21], 
like betweenness centrality, can improve the results, 
although centrality is non-trivially related to traffic flow if 
congestion is considered [22]. If the assumption of 
communication through shortest paths (which requires global 
knowledge) is relaxed, the use of local search algorithms 
[12] will result in a stronger dependence of efficiency on 
the network structure. A further refinement in this direction 
is to consider queuing of packets on the nodes and 
congestion. Recent works [15,16] have shown that transmission 
times are in this case not directly related to shortest 
distances, but have a much richer behavior, depending 
also on the total communication load carried by the network. 
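
As an illustration of how the load of intermediate nodes could be quantified, the sketch below computes shortest-path betweenness centrality for an undirected, unweighted network using Brandes' accumulation scheme. It is a generic example rather than a measure used in this work; the adjacency-list format and the function name are assumptions, and congestion effects are ignored.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness of every node.

    `adj` maps each node to a list of its neighbors (undirected,
    unweighted). Each unordered pair of end nodes is counted once.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []  # nodes in non-decreasing distance from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both of its end nodes.
    return {v: b / 2 for v, b in score.items()}


if __name__ == "__main__":
    # A hub with four leaves: all leaf-to-leaf paths cross the hub.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(betweenness(star))
```

In the star example all six leaf-to-leaf shortest paths cross the hub, which therefore concentrates the whole forwarding load, whereas the leaves carry none.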

Another important generalization is to consider complex 
networks presenting links with different communication 
speeds. This can be modeled using weighted networks, 
in which the weights of the links reflect the bandwidth or 
inverse latency of the interconnecting links. The nodes 
can also display different processing powers. These 
generalizations make the network models closer to real 
interconnection networks. 

The generalization of the parallel application model to 
include communications between the tasks is also of interest, 
both because it expands the classes of applications modeled 
and because of the importance of the network topology to the 
efficiency of these communications, which results in an 
interesting interplay between network and application 
characteristics. 

We conjecture that even with the generalization of 
the network, routing and application models, as suggested 
above, the efficiency will remain strongly related to cluster 
size, although the correlation with shortest distance may 
be reduced, and other network features may increase their 
importance, resulting in a stronger influence of the network 
model used. 